navigation step
2e5c2cb8d13e8fba78d95211440ba326-Supplemental.pdf
Finally, Section E illustrates qualitative results. We present the encoder-decoder variant of HAMT in fine-tuning on the right of Figure 1. Compared to the original cross-modal transformer on the left, the variant removes text-to-vision cross-modal attention. The encoder encodes the texts to obtain textual embeddings. The original target location is viewed as a middle stop point.
Narrowing the Gap between Vision and Action in Navigation
Zhang, Yue, Kordjamshidi, Parisa
The existing methods for Vision and Language Navigation in the Continuous Environment (VLN-CE) commonly incorporate a waypoint predictor to discretize the environment. This simplifies the navigation actions into a view selection task and improves navigation performance significantly compared to direct training using low-level actions. However, the VLN-CE agents are still far from real robots since there are gaps between their visual perception and executed actions. First, VLN-CE agents that discretize the visual environment are primarily trained with high-level view selection, which causes them to ignore crucial spatial reasoning within the low-level action movements. Second, in these models, the existing waypoint predictors neglect object semantics and their attributes related to passability, which can be informative in indicating the feasibility of actions. To address these two issues, we introduce a low-level action decoder jointly trained with high-level action prediction, enabling the current VLN agent to learn and ground the selected visual view to the low-level controls. Moreover, we enhance the current waypoint predictor by utilizing visual representations containing rich semantic information and explicitly masking obstacles based on humans' prior knowledge about the feasibility of actions. Empirically, our agent can improve navigation performance metrics compared to the strong baselines on both high-level and low-level actions.
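The idea of explicitly masking obstacles before choosing a waypoint can be illustrated with a minimal sketch. This is not the authors' predictor: the function name `select_waypoint`, the per-candidate `scores`, and the boolean `blocked` flags are all hypothetical, and the sketch assumes that obstacle information is already available as one flag per candidate direction.

```python
import math


def select_waypoint(scores, blocked):
    """Return the index of the best unblocked candidate, or None.

    scores  -- hypothetical predictor scores, one per candidate direction
    blocked -- hypothetical obstacle flags, True where the direction is impassable
    """
    if not scores:
        return None
    # Masking step: blocked candidates get -inf so they can never win.
    masked = [-math.inf if b else s for s, b in zip(scores, blocked)]
    best = max(range(len(masked)), key=masked.__getitem__)
    # If every candidate was masked out, report that no waypoint is feasible.
    return None if masked[best] == -math.inf else best


# The highest raw score (index 1) is blocked, so another candidate is chosen.
choice = select_waypoint([0.2, 0.9, 0.5], [False, True, False])
```

Masking with negative infinity rather than deleting candidates keeps the indices aligned with the original candidate list, which is convenient when the chosen index is later mapped back to a direction.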
Predictably Smart
It's easy to feel like new ML technologies call for us to rethink everything about UX design, but that's not quite true. The emergence of ML doesn't change the fact that the most usable, delightful UIs are those that embody principles of good design--like habituation--that many designers and researchers (Don Norman, Jakob Nielsen, Steve Krug, and Jeff Johnson to name a few) have been writing about for years. Evaluating recommendations or visually searching the interface for content counts as a navigation step, just like a tap or click. No ML-based suggestion will be "helpful" enough to offset breaking your user's flow state and muscle memory. But if you're confident that the user has a more open-ended goal like exploration, you have more leeway to put dynamic, ML-based features at the forefront of your UI.
Multi-Select Faceted Navigation Based on Minimum Description Length Principle
He, Chao (Chinese Academy of Sciences) | Cheng, Xueqi (Chinese Academy of Sciences) | Guo, Jiafeng (Chinese Academy of Sciences) | Shen, Huawei (Chinese Academy of Sciences)
Faceted navigation can effectively reduce the user effort of reaching targeted resources in databases by suggesting dynamic facet values for iterative query refinement. A key issue is minimizing the navigation cost in a user query session. The conventional navigation scheme assumes that at each step, users select only one suggested value to find the resources containing it. To make faceted navigation more flexible and effective, this paper introduces a multi-select scheme where multiple suggested values can be selected at one step, and a selected value can be used to either retain or exclude the resources containing it. Previous algorithms for cost-driven value suggestion do not work well under our navigation scheme. Therefore, we propose to optimize the navigation cost using the Minimum Description Length principle, which can balance the number of navigation steps and the number of suggested values per step under our new scheme. An empirical study demonstrates that our approach is more cost-saving and efficient than state-of-the-art approaches.
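A single step of the multi-select scheme described above, where several values are chosen at once and each one either retains or excludes the resources containing it, might look like the following sketch. It covers only the filtering step, not the MDL-based value suggestion. The function `apply_step`, the data layout (each resource mapped to a set of facet values), and the conjunctive combination of selections are assumptions made for illustration, not details taken from the paper.

```python
def apply_step(resources, retain=frozenset(), exclude=frozenset()):
    """Apply one multi-select navigation step (hypothetical helper).

    resources -- dict mapping a resource name to its set of facet values
    retain    -- values a resource must contain to be kept
    exclude   -- values a resource must not contain to be kept

    Assumption: all selections are combined conjunctively.
    """
    retain, exclude = set(retain), set(exclude)
    return {
        name: values
        for name, values in resources.items()
        if retain <= values and not (exclude & values)
    }


# Toy catalogue with made-up facet values.
catalogue = {
    "a": {"red", "small"},
    "b": {"red", "large"},
    "c": {"blue", "small"},
}

# One step: retain "red" and exclude "large" at the same time.
remaining = apply_step(catalogue, retain={"red"}, exclude={"large"})
```

Under a single-select scheme the same refinement would take at least two steps, which is the kind of cost difference the abstract refers to.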